Choosing the right collection in .NET

Published on 27 Jun 20204 min read0 Comments

As a developer, we usually have to choose the right collection for the task at hand, considering each collection's time complexity in a given scenario. Be aware that each collection is suitable for a specific context. In this post, I will be covering the data structures that are available as a built-in component in .NET.

Why choosing the right collection is important?

The answer to this question is pretty straightforward: We shouldn't be using a tool just because it's capable of handling the job. When choosing a data structure, it's important to consider how it will be used and the complexity of the operations that will be performed against it.

One should consider the proper data structure because it will not only improve code quality and readability but also it will avoid hidden performance issues for a specific scenario.

Features vs performance - Tips & Tricks

In the interest of keeping this blog post short, I'm not going to dive into the performance of each collection available in .NET. I will, however, offer general guidance when it comes to making decisions where the benefits outweigh the drawbacks of your implementation.

If you are interested in reading more about features and time complexity for each collection and their corresponding API, there is a lot of great documentation already, so please refer to the following list:

Avoiding performance issues

The best example that comes to mind is when we use List<T>, where List<T>.Add(...) has O(1) amortized time, which is nice, but List<T>.Insert(...) and List<T>.Remove(...) has O(n) complexity.

During these operations, the elements after the target index are shifted accordingly in order to reorganize the dynamic array. That said, writing other than at the end of the collection can be computationally expensive and an innocent Insert can be a potential source of performance issues.

When we understand each collection works, we can anticipate these potential issues.

Ensuring proper usage (and readability as added bonus)

When we use a more dedicated collection, we explicitly specify how it should be used for the specific scenario. A mere declaration naturally reduces the set of usage possibilities and keeps its readability clear.

For instance, if someone inspects a code he doesn't know and finds a class that manipulates a Queue<T>, it becomes naturally clear that this attribute is not inserting or removing elements in the middle of the collection. It means also that the first element added is always the first to be read and that at some point, the element is probably removed after its value is obtained.

All that information by reading one single line of code!

That said, it becomes clear that if we used a List<T> in a scenario where a Queue<T> would be a fit, even if the performance matches, we lose in readability by not using the right tool, thus compromising its interpretation.

There is no such thing as a free lunch

There is no data structure that will fit better in every context. In order to select a proper collection, you need to know what you can give away.

A dictionary, hashset or hashtable will give you O(1) amortized complexity when inserting, adding or removing items, but you can't guarantee order of the elements.
A binary search tree will give you O(log n) when inserting or removing and a predictable order. It's far more efficient than a O(n) for big sets, but it's not as fast as a dictionary.

That said, if you need fast access and you don't need order, the "limitations" of a dictionary is not even an issue. However, if you need order and elements are constantly entering and leaving at unpredictable positions, most search trees will have O(log n) complexity for both writing and retrieval, which is pretty decent, but not as fast as the dictionary's amortized constant time.

Design considerations

Once we have chosen the collection that suits best our specification, we need to make sure that our implementation does not leak to the consumer. For example, if we want to keep tight control over what/how/when elements are added, inserted or removed from the collection of our choice, providing read-only access is usually a good decision.

A word of caution

We shouldn't use a data structure just because it's capable of handling the job. It's important to consider the scenario and the operations that will be performed. The understanding of how things works under the hood helps us make the judgement of which data structure will perform better and ensure application's performance. It will also improve code quality and readability.